228 research outputs found

    Simple regret for infinitely many armed bandits

    We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner cannot try every arm even once and must dedicate its limited number of samples to only a certain number of arms. All previous algorithms for this setting were designed to minimize the cumulative regret of the learner. In this paper, we propose an algorithm that aims to minimize the simple regret. As in the cumulative regret setting of infinitely many armed bandits, the rate of the simple regret depends on a parameter $\beta$ characterizing the distribution of the near-optimal arms. We prove that, depending on $\beta$, our algorithm is minimax optimal either up to a multiplicative constant or up to a $\log(n)$ factor. We also provide extensions to several important cases: when $\beta$ is unknown, in a natural setting where the near-optimal arms have a small variance, and in the case of unknown time horizon. Comment: in 32nd International Conference on Machine Learning (ICML 2015)
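    To make the setting concrete, here is a minimal simulation sketch. It is not the algorithm proposed in the paper: it is a naive baseline that draws `k` fresh arms, splits the budget `n` evenly among them, and recommends the arm with the best empirical mean. The reservoir model (Bernoulli arms whose means satisfy P(1 - mean < eps) = eps**beta, with best possible mean 1), the function names, and all parameter values are assumptions made for illustration.

```python
import random


def draw_arm_mean(beta, rng):
    """Sample an arm mean so that P(1 - mean < eps) = eps**beta (assumed reservoir)."""
    return 1.0 - rng.random() ** (1.0 / beta)


def uniform_sample_and_commit(n, k, beta, rng):
    """Spend n pulls evenly on k fresh Bernoulli arms, recommend the empirical best.

    Returns the simple regret: 1 minus the true mean of the recommended arm.
    """
    pulls_per_arm = n // k
    best_mean, best_estimate = None, -1.0
    for _ in range(k):
        mean = draw_arm_mean(beta, rng)
        # Each pull is a Bernoulli reward with success probability `mean`.
        successes = sum(rng.random() < mean for _ in range(pulls_per_arm))
        estimate = successes / pulls_per_arm
        if estimate > best_estimate:
            best_mean, best_estimate = mean, estimate
    return 1.0 - best_mean


rng = random.Random(0)
regrets = [uniform_sample_and_commit(n=2000, k=40, beta=1.0, rng=rng) for _ in range(200)]
average_regret = sum(regrets) / len(regrets)
print(f"average simple regret: {average_regret:.3f}")
```

    The number of arms `k` controls the trade-off the abstract alludes to: more arms raise the chance of drawing a near-optimal one, but leave fewer pulls to identify it.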

    Second-Order Kernel Online Convex Optimization with Adaptive Sketching

    Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$ regret. Nonetheless, many common losses in kernel problems, such as squared loss, logistic loss, and squared hinge loss, possess stronger curvature that can be exploited. In this case, second-order KOCO methods achieve $\mathcal{O}(\log(\text{Det}(\boldsymbol{K})))$ regret, which we show scales as $\mathcal{O}(d_{\text{eff}}\log T)$, where $d_{\text{eff}}$ is the effective dimension of the problem and is usually much smaller than $\mathcal{O}(\sqrt{T})$. The main drawback of second-order methods is their much higher $\mathcal{O}(t^2)$ space and time complexity. In this paper, we introduce kernel online Newton step (KONS), a new second-order KOCO method that also achieves $\mathcal{O}(d_{\text{eff}}\log T)$ regret. To address the computational complexity of second-order methods, we introduce a new matrix sketching algorithm for the kernel matrix $\boldsymbol{K}_t$, and show that for a chosen parameter $\gamma \leq 1$ our Sketched-KONS reduces the space and time complexity by a factor of $\gamma^2$ to $\mathcal{O}(t^2\gamma^2)$ space and time per iteration, while incurring only $1/\gamma$ times more regret.
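    As an illustration of the first-order baseline mentioned above (not of KONS or Sketched-KONS), the following sketch runs functional gradient descent on a stream of points. The Gaussian kernel, the squared loss, the step size, the target function, and the class name `KernelOGD` are all assumptions chosen for the example. The model stores one coefficient per observed point, so each prediction costs $\mathcal{O}(t)$ time and space at round $t$.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars; k(x, x) = 1."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


class KernelOGD:
    """Functional gradient descent: f_t(x) = sum_s alpha_s * k(x_s, x)."""

    def __init__(self, step_size=0.5):
        self.step_size = step_size
        self.points = []  # one stored point per round: O(t) space
        self.alphas = []  # matching expansion coefficients

    def predict(self, x):
        # O(t) time: one kernel evaluation per stored point.
        return sum(a * gaussian_kernel(p, x) for p, a in zip(self.points, self.alphas))

    def update(self, x, y):
        """Take one gradient step on the squared loss and return the loss suffered."""
        prediction = self.predict(x)
        # The functional gradient of 0.5 * (f(x) - y)^2 is (f(x) - y) * k(x, .).
        self.points.append(x)
        self.alphas.append(-self.step_size * (prediction - y))
        return 0.5 * (prediction - y) ** 2


rng = random.Random(0)
learner = KernelOGD(step_size=0.5)
losses = []
for _ in range(300):
    x = rng.uniform(0.0, 2.0 * math.pi)
    losses.append(learner.update(x, math.sin(x)))

early_loss = sum(losses[:50]) / 50
late_loss = sum(losses[-50:]) / 50
print(f"mean loss, first 50 rounds: {early_loss:.4f}; last 50 rounds: {late_loss:.4f}")
```

    The expansion grows by one term per round, which is the $\mathcal{O}(t)$ per-iteration cost; a second-order method would additionally maintain a $t \times t$ kernel matrix.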

    A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption

    We study the problem of optimizing a function under a budgeted number of evaluations. We only assume that the function is locally smooth around one of its global optima. The difficulty of optimization is measured in terms of 1) the amount of noise $b$ of the function evaluation and 2) the local smoothness, $d$, of the function. A smaller $d$ results in smaller optimization error. We propose a new, simple, and parameter-free approach. First, for all values of $b$ and $d$, this approach recovers at least the state-of-the-art regret guarantees. Second, our approach obtains these results while being agnostic to the values of both $b$ and $d$. This leads to the first algorithm that naturally adapts to an unknown range of noise $b$ and leads to significant improvements in the moderate and low-noise regimes. Third, our approach also obtains a remarkable improvement over the state-of-the-art SOO algorithm when the noise is very low, which includes the case of optimization under deterministic feedback ($b=0$). There, under our minimal local smoothness assumption, this improvement is of exponential magnitude and holds for a class of functions that covers the vast majority of functions that practitioners optimize ($d=0$). We show that our algorithmic improvement is borne out in experiments, where we observe faster convergence on common benchmarks.

    Spectral Bandits for Smooth Graph Functions

    Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node, and its expected rating is similar to those of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret with respect to the optimal policy does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on a real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.

    Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

    We study the online influence maximization problem in social networks under the independent cascade model. Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it. We address the challenges of (i) combinatorial action space, since the number of feasible influencer sets grows exponentially with the maximum number of influencers, and (ii) limited feedback, since only the influenced portion of the network is observed. Under stochastic semi-bandit feedback, we propose and analyze IMLinUCB, a computationally efficient UCB-based algorithm. Our bounds on the cumulative regret are polynomial in all quantities of interest, achieve near-optimal dependence on the number of interactions, and reflect the topology of the network and the activation probabilities of its edges, thereby giving insights into the problem complexity. To the best of our knowledge, these are the first such results. Our experiments show that in several representative graph topologies, the regret of IMLinUCB scales as suggested by our upper bounds. IMLinUCB permits linear generalization and thus is both statistically and computationally suitable for large-scale problems. Our experiments also show that IMLinUCB with linear generalization can lead to low regret in real-world online influence maximization. Comment: Compared with the previous version, this version has fixed a mistake. This version is also consistent with the NIPS camera-ready version.

    Distance Metric Learning for Conditional Anomaly Detection

    Anomaly detection methods can be very useful in identifying unusual or interesting patterns in data. A recently proposed conditional anomaly detection framework extends anomaly detection to the problem of identifying anomalous patterns on a subset of attributes in the data. The anomaly always depends (is conditioned) on the values of the remaining attributes. The work presented in this paper focuses on instance-based methods for detecting conditional anomalies. The methods depend heavily on the distance metric that lets us identify examples in the dataset that are most critical for detecting the anomaly. To optimize the performance of the anomaly detection methods, we explore and study metric learning methods. We evaluate the quality of our methods on the Pneumonia PORT dataset by detecting unusual admission decisions for patients with community-acquired pneumonia. The results of our metric learning methods show improved detection performance over standard distance metrics, which is promising for building automated anomaly detection systems for a variety of intelligent monitoring applications.

    Online combinatorial optimization with stochastic decision sets and adversarial losses

    Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked, or goods that are out of stock. In this paper, we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.